All Questions
Tagged with class-imbalance training
15 questions
0 votes
1 answer
287 views
Aren't balanced data sets important in regression?
Why is it that the necessity for balanced data sets is (almost) always exclusively mentioned in the context of classification but not of regression?
1 vote
0 answers
91 views
Is it the right method to remove instances that are hard to predict before the train/test split?
In a binary classification problem, I have a slightly unbalanced medical dataset with class distribution 0: 5600, 1: 1500, where 0 means without a problem and 1 means with a problem. I tried many pipelines, AutoMLs, and ...
0 votes
0 answers
287 views
Train/test split on a small dataset along with SMOTE
I have an imbalanced binary classification dataset with 1000 samples (15% of class 1, 85% of the rest). My main goal is to build a robust classifier using the following approach. Wanted to know if ...
1 vote
1 answer
975 views
Test set larger than train set [closed]
There is a two-class dataset with 1121 values in total, 230 from one class and 891 from the other. The training set is chosen as 230+230=460 from both classes and the test set as the ...
1 vote
1 answer
30 views
Many questions about training on unbalanced and duplicated data
I'm a DS student. I have about 30,000 bank statements, all labeled with a specific category (cat1, cat2, ...). With that data I'm trying to train a classification model, but I found several problems: ...
-1 votes
1 answer
80 views
Using average precision as a metric for an imbalanced problem (learning curve example) [closed]
I have an imbalanced problem (2% target class) and therefore need an appropriate metric - so I chose average_precision. My code: ...
1 vote
1 answer
43 views
Does [under/over]-sampling teach the model the wrong distribution?
TLDR: Will under/oversampling during the training phase teach the model the wrong distribution and adversely affect accuracy? Let us assume you want to train a classifier to differentiate between ...
3 votes
1 answer
2k views
While downsampling training data, should we also downsample the validation data or retain the validation split as it is?
I am dealing with a class imbalance problem. In this case, I am downsampling the majority class labels in the training set. Among training, validation and test splits, the majority class in training ...
0 votes
1 answer
1k views
Splitting float values into train and test sets with train_test_split?
How do I split float values into train and test sets with train_test_split? I used LabelEncoder, but I have about 300K lines, and when I used the cross_val I saw ...
6 votes
2 answers
6k views
Resampling for imbalanced datasets: should the testing set also be resampled?
Apologies for what is probably a basic question, but I have not been able to find a definitive answer either in the literature or on the Internet. When dealing with an imbalanced dataset one possible ...
6 votes
2 answers
595 views
Why does the real-world output of my classifier have a label ratio similar to the training data?
I trained a neural network on a balanced dataset, and it has good accuracy of ~85%. But in the real world, positives appear in about 10% of the cases or less. When I test the network on a set with real world ...
2 votes
2 answers
123 views
Oversampling data with subclasses
Oversampling of under-represented data is a way to combat class imbalance. For example, if we have a training data set with 100 data points of class A and 1000 data points of class B, we can over ...
1 vote
3 answers
4k views
Downsampling and class ratios
My target variable is whether an application is accepted or not. It is a highly imbalanced target with 98.5% of applications accepted. I am unclear about the concept of downsampling. If I were to ...
7 votes
2 answers
3k views
How to fix class imbalance in training sample?
I was very recently asked in a job interview about solutions to fix an imbalance of classes in the training dataset. Let's focus on a binary classification case. I offered two solutions: oversampling ...